nvme: downgrade WARN in nvme_setup_rw to pr_debug#772

Open
blktests-ci[bot] wants to merge 2 commits into linus-master_base from series/1085752=>linus-master

blktests-ci Bot commented Apr 27, 2026

Pull request for series with
subject: nvme: downgrade WARN in nvme_setup_rw to pr_debug
version: 1
url: https://patchwork.kernel.org/project/linux-block/list/?series=1085752

When an NVMe namespace is configured with embedded metadata (flbas bit 4
set, NVME_NS_FLBAS_META_EXT) but no Protection Information (dps=0) and
no NVME_NS_METADATA_SUPPORTED, nvme_setup_rw() fires WARN_ON_ONCE on
any request that reaches it with REQ_INTEGRITY unset.  The WARN was
observed repeatedly during NVMe fuzz testing with a FEMU-based fuzzer
that performs semantic mutation of Identify Namespace responses.

The trigger requires three conditions to align: (a) a namespace
transitions through the EXT_LBAS non-PI state (head->ms != 0,
features & NVME_NS_EXT_LBAS, !(features & NVME_NS_METADATA_SUPPORTED)),
(b) nvme_init_integrity() returns false through the early-exit branch
at core.c:1834 without populating bi->metadata_size, leaving the disk
without an integrity profile (blk_get_integrity() returns NULL), and
(c) a request that was admitted to the block layer before the namespace
update reaches nvme_setup_rw() after it.
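The three conditions above can be read as one conjunction. The following is a userspace sketch of that reading, not kernel code: `struct ns_model`, the `MODEL_*` flag values, and `warn_can_fire()` are hypothetical names introduced only for illustration, and the flag bit positions are assumed.

```c
#include <stdbool.h>

/* Illustrative stand-ins for the NVME_NS_* feature bits (values assumed). */
#define MODEL_NS_EXT_LBAS           (1u << 0)
#define MODEL_NS_METADATA_SUPPORTED (1u << 1)

/* Hypothetical snapshot of the state named in conditions (a) to (c). */
struct ns_model {
	unsigned int ms;            /* head->ms: metadata bytes per LBA */
	unsigned int features;      /* head->features */
	bool has_integrity_profile; /* blk_get_integrity() != NULL */
	bool stale_request;         /* request admitted before the update */
};

/* Returns true only when all three conditions from the text hold. */
static bool warn_can_fire(const struct ns_model *ns)
{
	/* (a) metadata present, extended LBAs, no separate-metadata support */
	bool a = ns->ms != 0 &&
		 (ns->features & MODEL_NS_EXT_LBAS) &&
		 !(ns->features & MODEL_NS_METADATA_SUPPORTED);
	/* (b) the disk was left without an integrity profile */
	bool b = !ns->has_integrity_profile;
	/* (c) a request crossed the namespace update */
	bool c = ns->stale_request;

	return a && b && c;
}
```

Dropping any one condition makes the predicate false, which matches the description of the trigger as requiring all three to align.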

The admission gap arises in two places.  First, the plug-list flush
path: a process with dirty pages queued in a plug before the namespace
update flushes them on file close (blk_finish_plug -> blk_mq_dispatch
-> nvme_setup_rw), bypassing any capacity-zero gate.  Second, the
cached-rq path: blk_mq_submit_bio() at blk-mq.c:3155 may find a cached
request; if so, the bio_queue_enter() freeze-serialization guard at
blk-mq.c:3174-3176 is skipped and the bio is dispatched immediately.

In both cases the bio was submitted without REQ_INTEGRITY (because
blk_get_integrity() returned NULL at submission time, so
bio_integrity_action() returned 0 and bio_integrity_prep() was not
called), and it reaches nvme_setup_rw() for a namespace where
head->ms != 0.  The existing BLK_STS_NOTSUPP return correctly handles
this dispatch; the WARN_ON_ONCE is a false positive.
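The branch in question can be modelled in userspace as follows. This is a simplified sketch under stated assumptions, not the code in nvme_setup_rw(): `struct rw_model`, `struct rw_result`, and `model_setup_rw()` are hypothetical, and the PI case (which this description does not cover) is modelled as simply succeeding.

```c
#include <stdbool.h>

enum model_status { MODEL_STS_OK, MODEL_STS_NOTSUPP };

/* Hypothetical inputs to the check described in the text. */
struct rw_model {
	unsigned int ms;    /* head->ms */
	bool has_pi;        /* namespace has Protection Information (dps != 0) */
	bool req_integrity; /* blk_integrity_rq(): REQ_INTEGRITY is set */
};

struct rw_result {
	enum model_status status;
	bool assertion_hit; /* the branch guarded by WARN_ON_ONCE was taken */
};

static struct rw_result model_setup_rw(const struct rw_model *rq)
{
	struct rw_result res = { MODEL_STS_OK, false };

	/* Metadata-formatted namespace, request carries no integrity
	 * payload, and there is no PI: reject the request. */
	if (rq->ms != 0 && !rq->req_integrity && !rq->has_pi) {
		res.assertion_hit = true;
		res.status = MODEL_STS_NOTSUPP;
	}
	return res;
}
```

In this model the rejection and the assertion are two separate outputs of the same branch, which is why the log level can change while the BLK_STS_NOTSUPP result stays the same.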

The WARN was reproduced six times over four days of fuzzing (April
2026).  A representative crash shows the plug-flush path:

  nvme0n1: detected capacity change from 2097152 to 0
  WARNING: drivers/nvme/host/core.c:1042 at nvme_setup_rw+0x768/0xfd0
  PID: 785 (systemd-udevd)
  Call Trace:
   nvme_setup_cmd / nvme_queue_rq / blk_mq_dispatch_rq_list
   blk_mq_flush_plug_list / blk_finish_plug / blkdev_writepages
   sync_blockdev / bdev_release / __fput / sys_close

Replace WARN_ON_ONCE with pr_debug_ratelimited() so the condition is
logged at debug level without a splat.  The BLK_STS_NOTSUPP return is
preserved; I/O to the transitioning namespace is still rejected.

An alternative approach that addresses the root cause at the
integrity-profile level is proposed in patch 2/2: populate
bi->metadata_size for EXT_LBAS non-PI namespaces in nvme_init_integrity()
so that bio_integrity_action() returns non-zero, bio_integrity_prep()
sets REQ_INTEGRITY, and nvme_setup_rw() never reaches this branch.
Both patches are sent as RFC for maintainer guidance on the preferred
direction.

Tested: Compiled on linux-kcov-debug (6.19.0+, KASAN/DEBUG_LIST).
Boot-tested under FEMU with NVME_MALICIOUS_RESPONDER=1
NVME_SEMANTIC_DATA_MUTATOR=1; ran 4 concurrent dd processes plus 500
rescan_controller cycles.  No WARN, BUG, or Oops observed.

Found by FuzzNvme (Syzkaller with FEMU fuzzing framework).

Acked-by: Sungwoo Kim <[email protected]>
Acked-by: Dave Tian <[email protected]>
Acked-by: Weidong Zhu <[email protected]>
Signed-off-by: Chao Shi <[email protected]>

This patch is an alternative to patch 1/2: instead of downgrading the
assertion in nvme_setup_rw(), it addresses the root cause at the
integrity-profile level so that the assertion is never reached.

For PCIe namespaces with extended LBAs (NVME_NS_EXT_LBAS set, flbas
bit 4) but without PI and without NVME_NS_METADATA_SUPPORTED, the
early-exit branch of nvme_init_integrity() at core.c:1834 returns false
without populating bi->metadata_size.  As a result, blk_get_integrity()
returns NULL (it checks q->limits.integrity.metadata_size via
blk_integrity_queue_supports_integrity()), bio_integrity_action() returns
0, bio_integrity_prep() is never called, and REQ_INTEGRITY is never set
on bios dispatched to the namespace.  Any such bio that reaches
nvme_setup_rw() triggers WARN_ON_ONCE because head->ms != 0 but
blk_integrity_rq() returns false.

Populate bi->metadata_size = head->ms in the early-exit path for the
EXT_LBAS non-PI case.  This is sufficient to make blk_get_integrity()
return non-NULL, which causes bio_integrity_action() to return non-zero,
which causes bio_integrity_prep() to run and set REQ_INTEGRITY on any
bio submitted to the namespace.  Requests that reach nvme_setup_rw()
then satisfy blk_integrity_rq() and the assertion is not reached.
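The chain of consequences described above can be sketched as a small userspace model. It assumes, as the text states, that a non-zero metadata_size is what makes the integrity profile visible and that the profile alone decides whether REQ_INTEGRITY is set; all `model_*` names are hypothetical and none of this is kernel code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the queue limits and a submitted bio. */
struct limits_model { unsigned int metadata_size; };
struct bio_model { bool req_integrity; };

/* Models blk_get_integrity() != NULL as "metadata_size is non-zero". */
static bool model_has_integrity_profile(const struct limits_model *lim)
{
	return lim->metadata_size != 0;
}

/* Models submission: the integrity prep step runs, and sets the flag,
 * only when a profile is visible. */
static void model_submit_bio(const struct limits_model *lim,
			     struct bio_model *bio)
{
	bio->req_integrity = model_has_integrity_profile(lim);
}

/* Models the nvme_setup_rw() condition: metadata but no integrity flag. */
static bool model_assertion_reached(unsigned int ms,
				    const struct bio_model *bio)
{
	return ms != 0 && !bio->req_integrity;
}
```

With metadata_size left at 0 the model reaches the assertion for ms != 0; with metadata_size set to ms it does not.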

blk_validate_integrity_limits() accepts this configuration: with
csum_type=BLK_INTEGRITY_CSUM_NONE, pi_tuple_size=0, and pi_offset=0,
all checks pass (pi_offset + pi_tuple_size <= metadata_size, pi_tuple_size
must be 0 for CSUM_NONE), and interval_exp is auto-filled to
ilog2(logical_block_size).  No generate/verify callbacks are configured,
so no actual integrity computation occurs; only the blk_integrity_rq()
predicate is satisfied.  Capacity is still forced to 0 by
set_capacity_and_notify(), so new bios are rejected by bio_check_eod()
before queue entry.
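The checks listed above can be illustrated with a userspace model. It covers only the three rules named in this description and is not a copy of blk_validate_integrity_limits(); `struct integrity_model`, `model_ilog2()`, and `model_validate()` are hypothetical names.

```c
#include <stdbool.h>

enum model_csum { MODEL_CSUM_NONE, MODEL_CSUM_OTHER };

/* Hypothetical subset of the integrity limits discussed in the text. */
struct integrity_model {
	enum model_csum csum_type;
	unsigned int metadata_size;
	unsigned int pi_tuple_size;
	unsigned int pi_offset;
	unsigned int interval_exp; /* 0 means "not set by the driver" */
};

/* Integer log2 for a power-of-two block size. */
static unsigned int model_ilog2(unsigned int v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Returns true if the configuration passes the checks named above. */
static bool model_validate(struct integrity_model *bi,
			   unsigned int logical_block_size)
{
	/* The PI tuple must fit inside the metadata area. */
	if (bi->pi_offset + bi->pi_tuple_size > bi->metadata_size)
		return false;
	/* Without a checksum type there can be no PI tuple. */
	if (bi->csum_type == MODEL_CSUM_NONE && bi->pi_tuple_size != 0)
		return false;
	/* Auto-fill the protection interval from the block size. */
	if (!bi->interval_exp)
		bi->interval_exp = model_ilog2(logical_block_size);
	return true;
}
```

For example, a model configuration with metadata_size = 8, no checksum, and zero PI fields passes, and a 512-byte logical block size fills interval_exp with 9.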

Tested: Compiled on linux-kcov-debug (6.19.0+, KASAN/DEBUG_LIST).
Boot-tested under FEMU with NVME_SEMANTIC_DATA_MUTATOR=1; ran 4
concurrent dd processes plus 500 rescan_controller cycles with no WARN,
BUG, or Oops.  The EXT_LBAS + ms!=0 + !PI combination was not triggered
during testing (FEMU's mutator varies flbas and lbaf[0].ms independently;
flbas=0x10 with lbaf_idx=0 was not produced in this run).  The
bi->metadata_size assignment path was not exercised in testing;
correctness of blk_validate_integrity_limits() for this configuration
was verified by code inspection.  Provided as RFC.

Found by FuzzNvme (Syzkaller with FEMU fuzzing framework).

Acked-by: Sungwoo Kim <[email protected]>
Acked-by: Dave Tian <[email protected]>
Acked-by: Weidong Zhu <[email protected]>
Signed-off-by: Chao Shi <[email protected]>

blktests-ci Bot commented Apr 27, 2026

Upstream branch: dd6c438
series: https://patchwork.kernel.org/project/linux-block/list/?series=1085752
version: 1
